Test suites

Test suites let you validate your agent’s behavior across a range of scenarios, so that changes to prompts, variables, or logic don’t introduce regressions and the agent keeps producing the expected responses.

They use Large Language Models (LLMs) to simulate real conversations and automatically evaluate the agent’s responses against the expected outcomes. This enables end-to-end dialogue testing, which helps you identify issues in prompt design, context handling, and multi-turn reasoning that traditional scripted tests might overlook.

A test suite consists of multiple individual tests, which can be run separately or in batches. Because LLM-driven conversations can vary from run to run, running a batch for multiple iterations gives more thorough coverage and a more reliable picture of how the agent performs.
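
To make the structure concrete, here is a minimal sketch of a suite of tests run as a batch over several iterations. It is an illustration only, not the product's API: `AgentTest`, `run_batch`, `stub_agent`, and `keyword_evaluator` are hypothetical names, and the simulated conversation and the evaluator are replaced by simple deterministic stand-ins where a real test suite would use an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Every name in this sketch is hypothetical and chosen for illustration.


@dataclass
class AgentTest:
    name: str
    user_message: str      # what the simulated user says
    expected_outcome: str  # what the agent's reply should convey


Agent = Callable[[str], str]
Evaluator = Callable[[str, str], bool]  # (reply, expected_outcome) -> passed


def run_batch(
    tests: List[AgentTest],
    agent: Agent,
    evaluate: Evaluator,
    iterations: int = 1,
) -> Dict[str, float]:
    """Run every test `iterations` times and return the pass rate per test."""
    pass_rates: Dict[str, float] = {}
    for test in tests:
        passes = sum(
            evaluate(agent(test.user_message), test.expected_outcome)
            for _ in range(iterations)
        )
        pass_rates[test.name] = passes / iterations
    return pass_rates


def stub_agent(message: str) -> str:
    # Stand-in for the agent under test.
    if "refund" in message.lower():
        return "You can request a refund within 30 days."
    return "Sorry, I can't help with that."


def keyword_evaluator(reply: str, expected: str) -> bool:
    # Stand-in for an LLM evaluator: a plain case-insensitive substring check.
    return expected.lower() in reply.lower()


suite = [
    AgentTest("refund_policy", "How do I get a refund?", "refund within 30 days"),
    AgentTest("opening_hours", "When are you open?", "open from"),
]

results = run_batch(suite, stub_agent, keyword_evaluator, iterations=3)
print(results)
```

With a deterministic stub, every iteration gives the same result, so each pass rate is either 0.0 or 1.0. With a real LLM-backed agent, the pass rate across iterations shows how consistently a test passes, which is why batches are worth repeating.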